Pipeline management tool

ABSTRACT

Systems, methods, and non-transitory computer readable media are provided for managing pipelines of operations on data. A system may access data and provide a set of functions for the data. The system may receive a user's selection of one or more functions from the set of functions. The system may generate a pipeline of operations for the data based on the user's selection. The pipeline of operations may include the function(s) selected by the user.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of Ser. No. 17/558,560, filed Dec. 21, 2021, which is a continuation application of U.S. patent application Ser. No. 16/056,952, filed Aug. 7, 2018, now U.S. Pat. No. 11,221,831, which claims the benefit under 35 U.S.C. § 119(e) of the U.S. Provisional Application Ser. No. 62/543,544 filed Aug. 10, 2017, the content of which is hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

This disclosure relates to approaches for managing pipelines of operations on data.

BACKGROUND

Under conventional approaches, users may code pipelines that include multiple operations on data. For example, users may wish to create a pipeline that includes multiple modifications/processes on the data. Identifying and integrating previously written operations into new pipelines may be difficult. Users may be required to manually search for previously written operations and/or recode the operations. Modifying orders of operations within the pipeline may be difficult. Additionally, traditional coding tools may not allow users to run individual operations within the pipeline separately to check the accuracy of the code/results. Debugging the code of the pipelines and verifying individual operations on the data may be time consuming and resource intensive.

SUMMARY

Various embodiments of the present disclosure may include systems, methods, and non-transitory computer readable media configured to manage pipelines of operations on data. A system may access data and provide a set of functions for the data. The system may receive a user's selection of one or more functions from the set of functions. The system may generate a pipeline of operations for the data based on the user's selection. The pipeline of operations may include the function(s) selected by the user. The pipeline of operations for the data may include a modification operation or a visualization operation on at least a portion of the data.

In some embodiments, the set of functions for the data may be provided based on at least a portion of the data. In some embodiments, the set of functions may provide at least one of a modification operation or a visualization operation for the data. A type of the modification operation or the visualization operation may be determined based on at least a portion of the data.

In some embodiments, providing the set of functions for the data may include suggesting the set of functions. The set of functions may be suggested based on at least a portion of the data or a historical usage of the set of functions.

In some embodiments, providing the set of functions for the data may include displaying a pipeline creation interface. The pipeline creation interface may enable the user to search for existing functions and create new functions.

In some embodiments, the pipeline creation interface may enable the user to (1) view and modify code for a given function, and (2) view a result of applying the given function on the data.

In some embodiments, the pipeline creation interface enabling the user to create the new functions may include suggesting a data type for a variable of a given new function based on the variable.

In some embodiments, the system may receive the user's grouping of multiple functions from the set of functions. The system may generate a new function based on the multiple functions.

In some embodiments, the system may receive a change to a given function from the set of functions. The given function may be used in one or more related functions. The system may generate a dependency graph for the change to the given function. The system may provide information regarding an impact of the change to the given function on the related function(s) based on the dependency graph. The system may change the related function(s) based on the change to the given function.

These and other features of the systems, methods, and non-transitory computer readable media disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for purposes of illustration and description only and are not intended as a definition of the limits of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

Certain features of various embodiments of the present technology are set forth with particularity in the appended claims. A better understanding of the features and advantages of the technology will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

FIG. 1 illustrates an example environment for managing pipelines of operations on data, in accordance with various embodiments.

FIG. 2 illustrates an example interface for managing pipelines of operations on data, in accordance with various embodiments.

FIG. 3 illustrates an example interface for managing pipelines of operations on data, in accordance with various embodiments.

FIG. 4A illustrates an example interface for managing pipelines of operations on data, in accordance with various embodiments.

FIG. 4B illustrates an example interface for managing pipelines of operations on data, in accordance with various embodiments.

FIG. 5 illustrates a flowchart of an example method, in accordance with various embodiments.

FIG. 6 illustrates a block diagram of an example computer system in which any of the embodiments described herein may be implemented.

DETAILED DESCRIPTION

A claimed solution rooted in computer technology overcomes problems specifically arising in the realm of computer technology. In various implementations, a system may access data and provide a set of functions for the data. The set of functions for the data may be provided based on one or more criteria (e.g., one or more portions of the data, user input). The system may receive a user's selection of one or more functions from the set of functions. The system may receive the user's grouping of multiple functions from the set of functions, and generate a new function based on the multiple functions. The system may generate a pipeline of operations for the data based on the user's selection. A pipeline of operations may refer to one or more sequences of multiple operations. A pipeline of operations may include operations (e.g., functions) that take as their input(s) the output(s) of a prior operation and/or operations that provide their output(s) as input(s) to a subsequent operation. For example, an operation within a pipeline of operations may take as its primary input the output of the previous operation in the pipeline and/or may provide its output as a primary input to the next operation in the pipeline. Outputs of individual operations may be passed through the pipeline as inputs to the following operations. The pipeline of operations may include the function(s) selected by the user. The pipeline of operations for the data may include one or more modification operations and/or one or more visualization operations on one or more portions of the data.
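By way of illustration only, the passing of outputs through a pipeline as inputs to the following operations may be sketched as below. The names run_pipeline, drop_negative, and to_mph are hypothetical and do not appear in the disclosure; the sketch assumes each operation takes a single primary input and returns a single output.

```python
from functools import reduce
from typing import Any, Callable, Sequence

# An operation takes the previous output as its primary input.
Operation = Callable[[Any], Any]


def run_pipeline(data: Any, operations: Sequence[Operation]) -> Any:
    """Apply each operation in order, feeding each output to the next operation."""
    return reduce(lambda value, operation: operation(value), operations, data)


# Hypothetical operations on a list of trip speeds in km/h.
def drop_negative(speeds):
    return [s for s in speeds if s >= 0]


def to_mph(speeds):
    return [round(s * 0.621371, 1) for s in speeds]


speeds_mph = run_pipeline([100, -5, 80], [drop_negative, to_mph])
print(speeds_mph)
```

Reordering the list passed to run_pipeline changes the order in which the operations are applied, without any change to the operations themselves.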

The set of functions for the data may be provided via a pipeline creation interface. The pipeline creation interface may enable the user to search for existing functions and create new functions. The pipeline creation interface may enable the user to (1) view and modify code for a given function, and (2) view a result of applying the given function on the data. A change to a given function from the set of functions may be received. The given function may be used in one or more related functions. A dependency graph for the change to the given function may be generated. Information regarding an impact of the change to the given function on the related function(s) may be provided based on the dependency graph. The related function(s) may be changed based on the change to the given function.

The approaches disclosed herein enable management/customization of pipelines of operations on data. The approaches disclosed herein provide an interface that enables users to search for existing functions, create new functions, modify functions, select functions for operations on data, see results of applying individual functions on data, modify selections of functions (e.g., add, remove, change order of functions), and/or create/modify pipelines of operations on data. Impact of modifying functions on related functions may be provided to users and changes to functions may be propagated to related functions. One or more external functions (e.g., visualization function) may be provided for use through the interface, giving users flexibility to use functions from external libraries.

FIG. 1 illustrates an example environment 100 for managing pipelines of operations on data, in accordance with various embodiments. The example environment 100 may include a computing system 102. The computing system 102 may include one or more processors and memory (e.g., permanent memory, temporary memory). The processor(s) may be configured to perform various operations by interpreting machine-readable instructions stored in the memory. As shown in FIG. 1, in various embodiments, the computing system 102 may include an access engine 112, a function engine 114, a selection engine 116, a pipeline engine 118, other engines, and/or other components. The environment 100 may also include one or more datastores (not shown) that are accessible to the computing system 102 (e.g., via one or more network(s)). In some embodiments, the datastore(s) may include various databases, application functionalities, application/data packages, and/or other data that are available for download, installation, and/or execution. While the computing system 102 is shown in FIG. 1 as a single entity, this is merely for ease of reference and is not meant to be limiting. One or more components/functionalities of the computing system 102 described herein may be implemented in a single computing device or multiple computing devices.

In various embodiments, the access engine 112 is configured to access data. Accessed data may include data for which one or more operations are desired. Data may be accessed from one or more storage locations. A storage location may refer to electronic storage located within the computing system 102 (e.g., integral and/or removable memory of the computing system 102), electronic storage coupled to the computing system 102, and/or electronic storage located remotely from the computing system 102 (e.g., electronic storage accessible to the computing system 102 through a network). In some embodiments, data may be stored in one or more databases/datastores. Data may be stored within a single file or across multiple files.

For example, the access engine 112 may access data relating to one or more groups of vehicles (e.g., consumer vehicles, commercial vehicles, passenger vehicles, delivery vehicles). The data relating to the group(s) of vehicles may be organized using one or more particular structures. For example, the data may be organized as one or more tables/data frames, with values relating to the group(s) of vehicles being stored within a particular location (e.g., row, column) within the table(s) based on the characteristics to which the values relate. For example, information relating to individual trips taken by the vehicles may be separated into separate rows (or columns) and individual characteristics relating to the trips (e.g., vehicle identifier, vehicle operator, departure location, current location, destination, average speed, maximum speed, current speed, direction of travel, weight, elevation, distance traveled, time traveled, time of trip, fuel consumed, type/amount of load, sensor readings) may be separated into separate columns (or rows). While the disclosure is described herein with respect to data relating to group(s) of vehicles, this is merely for illustrative purposes and is not meant to be limiting. Other types/organizations of data are contemplated.

In various embodiments, the function engine 114 is configured to provide one or more sets of functions for the data. A function may refer to one or more groupings of code that perform one or more specific operations on data. A set of functions may refer to a grouping of one or more functions. Operations on data may include processes that modify the data (e.g., change the data, create new data based on the data, delete the data, combine the data with other data), processes that visualize the data (e.g., in a plot, in a table, in a chart, in a map), and/or other operations of the data. In some embodiments, functions provided by the function engine 114 may be specific to the data (e.g., the type of data accessed), the user (e.g., the type of user, user's access level), the use-case (e.g., project-based functions), and/or other information. The functions provided by the function engine 114 may be selected by users to generate one or more pipelines of operations on the data. For example, the accessed data may be transformed into different shapes of data in a single step or in multiple steps based on users' selection of functions provided by the function engine 114.

In some embodiments, the function engine 114 may provide functions within one or more libraries. A library may include a collection of functions and the function engine 114 may provide the collection of functions based on availability of the library. In some embodiments, the function engine 114 may provide access to external functions. For example, visualization functions (e.g., providing visual representations of data in a plot, in a table, in a chart, in a map) may be provided by the function engine 114 based on importing the relevant library (or relevant portion of the library) including the visualization functions. As another example, a given visualization function may be provided by the function engine 114 based on the function engine 114 sending one or more portions of the data (the input data) to an external process/library that processes the data according to the visualization function and returns the results of the visualization operation to the function engine 114. Providing access to external functions may give users greater flexibility in selecting functions for pipelines, may allow users to use external functions within pipelines without coding the external functions themselves, and/or may allow users to offload one or more portions of the pipelines' processes to external resources (e.g., external function, external library, external computing system).

The set(s) of functions for the data may be provided through one or more interfaces (e.g., user interface(s), application program interface(s)). For example, a user interface may provide a listing of functions available to operate on the data. The user interface may include a search field enabling users to search for particular functions. Users may use the search field to search for particular functions based on names of functions, keywords of functions, operations performed by functions, and/or other information relating to functions. For example, a set of functions for data may include a Greater Than function that takes in two arguments. A first argument may define the portion of data (e.g., a row/column, a portion of a row/column) to be compared to a threshold value, and the second argument may define the threshold value. One or both of the arguments may be determined based on one or more outputs of a prior operation. Users may find the Greater Than function from the set of functions by entering the term “greater” or similar terms in the search field. Responsive to users' entering of the appropriate term (e.g., “greater”) in the search field, the user interface may display the Greater Than function, along with other functions that match the entered term with their names, keywords, operations, and/or other information related to the functions.
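As a non-limiting sketch, matching a search term against names, keywords, and descriptions of functions may look like the following. The FunctionEntry record, the search_functions helper, and the sample entries are assumptions made for illustration; the disclosure does not prescribe any particular data structure or matching rule.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FunctionEntry:
    """Hypothetical catalog record for one available function."""
    name: str
    keywords: Tuple[str, ...] = ()
    description: str = ""


def search_functions(catalog: List[FunctionEntry], term: str) -> List[FunctionEntry]:
    """Return entries whose name, keywords, or description contain the term."""
    needle = term.strip().lower()
    if not needle:
        return []
    return [
        entry for entry in catalog
        if needle in entry.name.lower()
        or any(needle in keyword.lower() for keyword in entry.keywords)
        or needle in entry.description.lower()
    ]


CATALOG = [
    FunctionEntry("Greater Than", ("compare", "threshold"), "Keep values above a threshold"),
    FunctionEntry("Sort", ("order",), "Sort rows by a column"),
    FunctionEntry("Histogram", ("plot", "distribution"), "Plot a histogram of a numeric column"),
]

matches = [entry.name for entry in search_functions(CATALOG, "greater")]
print(matches)
```

A simple case-insensitive substring match is used here; fuzzy matching or ranking of results would be equally consistent with the description.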

In some embodiments, the set(s) of functions for the data may be provided based on one or more portions of the data. The function engine 114 may identify the types of operations that may be performed on the data or portion(s) of the data, and provide the identified functions. For example, a first data may include sufficient information to utilize a histogram visualization operation, while a second data may include sufficient information to utilize a histogram visualization operation, a scatter plot visualization operation, a wind rose chart visualization operation, and a map visualization operation. Based on access of the first data, the function engine 114 may provide via the interface(s) the histogram visualization operation. Based on access of the second data, the function engine 114 may provide via the interface(s) the histogram visualization operation, the scatter plot visualization operation, the wind rose chart visualization operation, and the map visualization operation. In some embodiments, the function engine 114 may provide other (general) functions that are applicable across multiple types of data, such as a Sort function (e.g., sorting alphabetically/numerically, based on time/date).

As another example, the interface(s) providing the functions may allow users to select one or more portions of the data (e.g., a particular row/column, a particular portion of a row/column). The function engine 114 may identify the types of operations that may be performed on the selected portion(s) of the data and may provide the identified functions. For example, the accessed data may include tabular data (information organized into rows and columns), with different types of information included in different rows/columns. Based on users' selection of a particular row/column, the function engine 114 may identify the types of operations that may be performed on the selected portion(s) of the data/the types of data, and provide the identified functions. For example, a tabular data may include a first column of numerical values and a second column of string values. Based on different types of data in the first column and the second column, the function engine 114 may identify and provide different functions for the first column and the second column. For example, based on users' selection of the first column, the function engine 114 may provide via the interface(s) modification/visualization operation(s) relating to numerical values (e.g., numerical operators, numerical plots). Based on users' selection of the second column, the function engine 114 may provide via the interface(s) modification/visualization operation(s) relating to string values (e.g., language operators). In some embodiments, the function engine 114 may also provide other (general) functions that are applicable across multiple types of data, such as a Column/Row Operator function (e.g., removing/moving column/row). Provision of other types of functions based on portion(s) of data are contemplated.
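One possible way to provide different functions for a numerical column and a string column is sketched below. The type-inference rule, the function names in FUNCTIONS_BY_TYPE, and the sample table are hypothetical; they merely illustrate selecting type-specific functions and appending general functions.

```python
from typing import Dict, List


def infer_column_type(values: list) -> str:
    """Classify a column as 'numeric', 'string', or 'mixed' from its values."""
    present = [v for v in values if v is not None]
    if present and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    if present and all(isinstance(v, str) for v in present):
        return "string"
    return "mixed"


# Hypothetical mapping from data type to type-specific functions.
FUNCTIONS_BY_TYPE: Dict[str, List[str]] = {
    "numeric": ["Greater Than", "Histogram"],
    "string": ["Translate", "Uppercase"],
}
# General functions offered regardless of data type.
GENERAL_FUNCTIONS = ["Sort", "Remove Column"]


def functions_for_column(table: Dict[str, list], column: str) -> List[str]:
    """List type-specific functions for the selected column, then general ones."""
    kind = infer_column_type(table[column])
    return FUNCTIONS_BY_TYPE.get(kind, []) + GENERAL_FUNCTIONS


trips = {"max_speed": [88.5, 102.0, None], "operator": ["Ana", "Raj", "Lee"]}
print(functions_for_column(trips, "max_speed"))
print(functions_for_column(trips, "operator"))
```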

In some embodiments, the functions identified for data/portion(s) of data may be tied to one or more characteristics of data types. Referring to the example above of the first column including numerical values, the function engine 114 may identify different functions for the numerical values based on the characteristics of the numerical values within the selected portion. For example, based on the numerical values of the selected portion being of a particular measurement standard (e.g., metric system), an identified function may include a Conversion function to convert the values to a different measurement standard (e.g., standard system). Referring to the example above of the second column including string values, the function engine 114 may identify different functions for the string values based on the characteristics of the string values. For example, based on the string values of the selected portion being of a particular language (e.g., English), an identified function may include a Conversion function to convert the values to a different language (e.g., Spanish). Provision of other types of functions based on characteristics of data types are contemplated.

In some embodiments, providing the set(s) of functions for the data may include suggesting the set(s) of functions. Suggesting a set of functions may include ranking/prioritizing the more likely to be used functions above the less likely to be used functions. The set of functions may be suggested based on at least a portion of the data or a historical usage of the set of functions. For example, the function engine 114 may list the identified functions in the order of importance/likely usage based on the data type within the selected portion and/or based on frequency of prior usage of given functions (with respect to the accessed data, similar data, similar pipeline). In some embodiments, the set of functions may be provided with the number of times the same/similar function has been used for the accessed data/portion of the data, similar data, and/or similar pipelines.
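The ranking of functions by frequency of prior usage, together with the usage count shown alongside each function, may be sketched as follows. The rank_by_usage helper and the usage log are assumptions for illustration; the disclosure does not specify how usage history is stored or how ties are broken (here, alphabetically).

```python
from collections import Counter
from typing import List, Tuple


def rank_by_usage(candidates: List[str], usage_log: List[str]) -> List[Tuple[str, int]]:
    """Order candidate functions by prior usage count, most used first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter(usage_log)
    ranked = sorted(candidates, key=lambda name: (-counts[name], name))
    return [(name, counts[name]) for name in ranked]


# Hypothetical record of functions previously applied to similar data.
usage_log = ["Histogram", "Sort", "Histogram", "Greater Than", "Histogram", "Sort"]
suggestions = rank_by_usage(["Sort", "Histogram", "Greater Than", "Scatter Plot"], usage_log)
print(suggestions)
```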

In some embodiments, providing the set(s) of functions for the data may include suggesting one or more parameters for the set(s) of functions. For example, referring to the example of the Greater Than function, providing the Greater Than function may include suggesting one or more parameters for the arguments of the Greater Than function. For example, the function engine 114 may suggest one or more particular portions (e.g., rows/columns) which may be selected as the portion of data to be compared to a threshold value and/or may suggest one or more particular values to be selected as the threshold value. In some embodiments, the parameters suggested for a given function may change based on users' selection of other parameters for the given function. For example, referring to the example of the Greater Than function, the threshold value suggested by the function engine 114 may change based on the portion of the data to which the threshold value will be compared (e.g., different threshold value suggested based on the compared data being speed versus distance traveled). The function engine 114 may suggest different values for the threshold value based on the portion of the data selected by users for comparison.

In some embodiments, particular functions may be provided/suggested based on users' selection of one or more given functions and/or one or more given parameters for given function(s). For example, a dataset A may be frequently used with dataset B for a given function, and based on users' selection of the dataset A to be used for the given function, the function engine 114 may suggest the use of the dataset B for another argument/parameter of the given function. As another example, a function B may frequently follow the use of a function A, and based on users' selection of the function A for operation on the data, the function engine 114 may suggest the use of the function B for operation on the data. In some embodiments, one or more functions/parameters may be suggested based on ordering of functions/parameters selected by users.

In some embodiments, an interface through which set(s) of functions are provided may include a pipeline creation interface. The pipeline creation interface may include one or more features and/or enable one or more functions of interfaces discussed above. The pipeline creation interface may provide views (e.g., listings) of functions. The pipeline creation interface may provide views of functions within one or more libraries, and may allow users to select/import/export the relevant libraries. The listing of functions may be used by users to select one or more functions for operation on data. The pipeline creation interface may provide views of code for the functions. For example, based on a user's selection of a given function, the pipeline creation interface may enable the user to see the code that has been written to accomplish the given function. The pipeline creation interface may enable the user to view a result of applying the given function on the data. The pipeline creation interface may provide views of data before and after application of one or more functions/operations on the data (e.g., before and after data transformation). Such views may provide previews of applying the functions on the data and may allow users to run individual functions/operations to check the accuracy of the corresponding code/results. The pipeline creation interface may enable users to modify the code for functions. Thus, the pipeline creation interface may facilitate users' debugging of code of functions/operations and verification of individual functions/operations in an intuitive and timely manner.

In some embodiments, the pipeline creation interface may enable users to search for existing functions and create new functions. In some embodiments, the pipeline creation interface may require users to first search for functions before allowing the users to create functions. For example, a user may wish to perform a Greater Than operation on a portion of data. The user may be required to use a search field (as discussed above) to first search for a function (e.g., entering “greater” in the search field). Responsive to the user's input of a value in the search field, the function engine 114 may provide a list of matching functions and provide an option to create a new function. If the user does not find the desired function among the listed functions, the user may use the create new function option to code the desired operation. Requiring the user to search for functions before allowing the user to create a new function may reduce the likelihood of users recoding existing functions.

In some embodiments, the pipeline creation interface may suggest one or more data types for variables of a new function and/or a modified function. A user creating a new function/modifying an existing function may provide a variable for the function, and the pipeline creation interface may suggest one or more data types for the variable based on how the variable is identified in the code. For example, a user may be creating the Greater Than function, and may define a variable for the second argument (threshold value) as “Num.” Based on the identification of the variable as “Num,” the function engine 114 may determine that the variable desired is likely a numerical value and may set/suggest the data type of the variable as a number type. This setting of the data type may be used to determine the portion of the data/user input which may be used as the variable when the function is selected for use. That is, the variable being a number type may be used to filter out portion(s) of the data including string values and suggest portion(s) of the data including numerical values, or may be used to request a numerical value from users when running the function.
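A minimal sketch of suggesting a data type from how a variable is named, and then using that type to filter portions of the data, is given below. The NAME_HINTS table, suggest_type, and columns_matching are hypothetical; an actual embodiment may use any other heuristic for inferring the type.

```python
from typing import Dict, List, Optional

# Hypothetical hints: substring of the variable name -> suggested data type.
NAME_HINTS = (
    ("num", "number"),
    ("count", "number"),
    ("threshold", "number"),
    ("name", "string"),
    ("text", "string"),
)

# Python types accepted for each suggested data type.
TYPE_CHECKS = {"number": (int, float), "string": (str,)}


def suggest_type(variable_name: str) -> Optional[str]:
    """Suggest a data type from the variable name, or None if no hint matches."""
    lowered = variable_name.lower()
    for hint, data_type in NAME_HINTS:
        if hint in lowered:
            return data_type
    return None


def columns_matching(table: Dict[str, list], data_type: str) -> List[str]:
    """Keep only columns whose values all fit the suggested data type."""
    accepted = TYPE_CHECKS[data_type]
    return [
        name for name, values in table.items()
        if values and all(isinstance(v, accepted) for v in values)
    ]


trips = {"max_speed": [88.5, 102.0], "operator": ["Ana", "Raj"]}
suggested = suggest_type("Num")
candidates = columns_matching(trips, suggested)
print(suggested, candidates)
```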

In some embodiments, the pipeline creation interface may enable users to define the structure of data which may be used for a given variable. For example, for individual variables defined within the code for a given function, the pipeline creation interface may provide options (e.g., buttons, fields) through which the users may define whether the corresponding data may be found within a column, a numerical column, a string column, a row, a numerical row, a string row, a box, a numerical box, a string box, and/or other structures.

In some embodiments, the pipeline creation interface may enable users to restrict or provide flexibility in selection of data/parameters for arguments in a given function when the given function is run. For example, the pipeline creation interface may enable users to restrict the use of a given function to a particular data type (e.g., limited to use on speed data) or a particular value (e.g., a predefined value), or may enable the data type to be operated on/parameter to be used to be selected at the time of use (e.g., when the function is run, users may be required to select the data type to be operated on by the function/to select the value).

In some embodiments, a new function may be generated based on multiple functions. For example, the pipeline creation interface may enable users to select multiple functions (within a library, within a listing of functions selected by the users) and generate a new function based on the selected functions. The code for the new function may be automatically generated based on the code of the selected functions. The code for the new function may include raw code of the selected functions or references to the underlying code of the selected functions. In some embodiments, the raw code of the new functions may be updated based on changes to the underlying code of the selected functions.
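The variant in which the new function holds references to the selected functions, rather than copies of their raw code, may be sketched as follows. The REGISTRY dictionary and the register and group_functions helpers are assumptions for illustration. Because members are looked up by name when the grouped function runs, a later change to a member is reflected in the grouped function.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping function names to their current code.
REGISTRY: Dict[str, Callable] = {}


def register(name: str, function: Callable) -> None:
    """Add a function to the registry, or replace its code if it exists."""
    REGISTRY[name] = function


def group_functions(new_name: str, member_names: List[str]) -> Callable:
    """Create a new function that applies the named members in order."""
    members = list(member_names)
    missing = [m for m in members if m not in REGISTRY]
    if missing:
        raise KeyError(f"unknown functions: {missing}")

    def grouped(data):
        for member in members:
            # Resolved at call time, so edits to a member propagate.
            data = REGISTRY[member](data)
        return data

    register(new_name, grouped)
    return grouped


register("Drop Missing", lambda rows: [r for r in rows if r is not None])
register("Sort", lambda rows: sorted(rows))
clean_and_sort = group_functions("Clean And Sort", ["Drop Missing", "Sort"])

before = clean_and_sort([3, None, 1])
register("Sort", lambda rows: sorted(rows, reverse=True))  # edit a member
after = clean_and_sort([3, None, 1])
print(before, after)
```

Copying the raw code of the members into the new function would instead freeze their behavior at grouping time, which is why the description separately contemplates updating the raw code when the underlying functions change.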

In some embodiments, one or more dependency graphs may be used to keep track of changes to a given function and determine the impact of the change to the given function on related functions. Related functions may refer to functions that depend/call on the given function (e.g., the code of the related functions references the given function; the code of the related functions is updated based on changes to the code of the given function). Based on a change to the code of the given function (e.g., received through the pipeline creation interface), a dependency graph for the change may be generated. The dependency graph may be used to track how changes to code of a given function are propagated to other functions. Information regarding the impact of the change to the given function on the related function(s) may be provided based on the dependency graph. For example, before changes to the given function are finalized, the user may be provided with a warning indicating that the changes to the given function will impact other functions and/or analysis using the given function/other functions. For example, the warning may identify the related functions/analysis or provide a summary of the related functions/analysis (e.g., number of related functions/analysis). In some embodiments, the warning may include links and/or options that allow users to see a listing of the related functions that will be affected by the change and/or see how those changes in the related functions will change operations on the data (see previews of the change in data operations of the related functions). Users may be provided with options to (1) reject the change (e.g., restore the code of the given function to its prior state), (2) accept the change (e.g., overwrite the previous version of the code of the given function) and change the related function(s), and/or (3) store the changed code as a new function. In some embodiments, one or more version controls may be used to keep track of different versions of functions.
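Determining which related functions are impacted by a change may be sketched as a traversal of the dependency graph, as below. The representation of the graph as a mapping from each function to the functions it calls, the impacted_functions helper, and the sample function names are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Dict, List, Set


def impacted_functions(calls: Dict[str, Set[str]], changed: str) -> List[str]:
    """Return every function that directly or indirectly calls `changed`.

    `calls` maps each function name to the set of functions it calls.
    """
    # Invert the call graph: callee -> functions that call it.
    dependents: Dict[str, Set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            dependents.setdefault(callee, set()).add(caller)

    # Breadth-first walk from the changed function; the `impacted` set
    # also guards against revisiting nodes if the graph has a cycle.
    impacted: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent != changed and dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return sorted(impacted)


calls = {
    "Fast Trips": {"Greater Than"},
    "Fast Trip Report": {"Fast Trips", "Histogram"},
    "Histogram": set(),
    "Greater Than": set(),
}
affected = impacted_functions(calls, "Greater Than")
print(f"Changing Greater Than will affect {len(affected)} function(s): {affected}")
```

The returned list could back either form of warning described above: the full listing of related functions, or only a summary count.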

In various embodiments, the selection engine 116 is configured to receive a user's selection of one or more functions from the set(s) of functions. The user's selection of function(s) may be received through one or more interfaces (e.g., user interface(s), application program interface(s)). For example, the selection engine 116 may receive the user's selection of the function(s) based on the user's interaction with a user interface. The selection engine 116 may receive the user's selection of one or more functions based on a user's searching for particular functions (e.g., searching for a given function and selecting one of the listed functions). The selection engine 116 may receive the user's selection of one or more functions created/modified by the user (e.g., through the pipeline creation interface). The selection engine 116 may receive the user's selection of one or more functions provided based on the data/portion(s) of the data. The selection engine 116 may receive the user's selection of a function generated based on a combination of multiple functions. Other selections of functions are contemplated.

In some embodiments, the selection engine 116 may provide information (e.g., warnings) based on improper/incomplete selection of functions and/or data for the selected functions. For example, a user may have selected a particular visualization function to map the routes taken by multiple vehicles. However, the portions of the data (e.g., columns, rows of data) selected by the user to provide location information for the visualization operation may not include sufficient information to provide visualization of the routes of the vehicles. For example, the portions of the data selected by the user may include information on the vehicles' current locations but may not include information on where the vehicles were located in the past. Based on the data not including sufficient information for the selected function, the selection engine 116 may provide a warning that the selected function cannot be performed because the function was not provided with sufficient data. In some embodiments, the selection engine 116 may identify the missing data so that the user may change the data to which the function is applied and select the needed data for the function. In some embodiments, the selection engine 116 may change the selected function to a similar function that is supported by the selected data. For example, based on the selected data missing information on where vehicles were located in the past but including information on the vehicles' current locations, the selection engine 116 may change the selected function to a visualization function that maps current locations of the vehicles.
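The check described above may be sketched, under stated assumptions, as a comparison between the columns a function requires and the columns the user selected. The registry layout, the column names, and the function names `map_routes` and `map_current_locations` are hypothetical.

```python
# Hypothetical registry: each visualization function lists the columns it
# needs and, optionally, a similar function to fall back to.
FUNCTIONS = {
    "map_routes": {
        "requires": {"vehicle_id", "latitude", "longitude", "timestamp"},
        "fallback": "map_current_locations",
    },
    "map_current_locations": {
        "requires": {"vehicle_id", "latitude", "longitude"},
        "fallback": None,
    },
}


def check_selection(function_name, selected_columns, registry=FUNCTIONS):
    """Return (function_to_use, warning); warning is None if nothing is missing."""
    selected = set(selected_columns)
    missing = registry[function_name]["requires"] - selected
    if not missing:
        return function_name, None

    # Name the missing data so the user can extend the selection.
    warning = f"{function_name} cannot run: missing {sorted(missing)}"

    # Switch to the similar function only if the selected data supports it.
    fallback = registry[function_name]["fallback"]
    if fallback and registry[fallback]["requires"] <= selected:
        return fallback, warning
    return None, warning
```

In this sketch, selecting only `vehicle_id`, `latitude`, and `longitude` for `map_routes` yields a warning that names `timestamp` as missing and substitutes `map_current_locations`.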

In various embodiments, the pipeline engine 118 is configured to generate one or more pipelines of operations for the data based on the user's selection. A pipeline of operations may include one or more functions selected by the user. For example, the pipeline of operations may include one or more modification operations and/or visualization operations on one or more portions of data. The pipeline of operations may define an order in which the functions are applied to data. The pipeline of operations may include a linear pipeline or a branching pipeline. In some embodiments, the pipeline of operations may be dynamically generated. For example, the pipeline of operations may be updated/modified when users select a new function for inclusion in the pipeline. The pipeline of operations may be updated/modified when users remove a function from the pipeline. The pipeline of operations may be updated/modified when users change the ordering of functions within the pipeline. The pipeline of operations may be exported for use in a production pipeline, for analysis, for transformation, and/or other uses. In some embodiments, exporting the pipeline of operations may include automatic removal of visualization operations/functions from the pipeline of operations. Visualization operations/functions may help a user to understand/see how different portions of the pipeline of operations work, but may not be needed in a workflow. For instance, such visualization functions may not be necessary in a production pipeline and may introduce unnecessary costs (e.g., processing power, processing time, power consumption, memory consumption) into the production pipeline. Automatic removal of visualization operations/functions may be facilitated by the step-by-step selection of operations/functions of the pipeline as disclosed herein. The pipeline of operations may be modified to remove the visualization operations/functions. Automatic removal of visualization operations/functions may provide for an export of the pipeline that includes the non-visualization operations/functions within the pipeline.
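A minimal sketch of such an export, assuming each step of a linear pipeline carries a flag marking it as a visualization step, is given below. The `Step` type, the `is_visualization` flag, and the example steps are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    func: Callable[[list], list]
    is_visualization: bool = False  # hypothetical metadata flag


def export_pipeline(steps: List[Step]) -> List[Step]:
    """Keep the original order but drop every visualization step."""
    return [step for step in steps if not step.is_visualization]


def run(steps: List[Step], data: list) -> list:
    """Apply each step to the output of the previous one."""
    for step in steps:
        data = step.func(data)
    return data


def show_table(rows: list) -> list:
    # Stand-in for a visualization: displays the rows and passes them through.
    print(rows)
    return rows


pipeline = [
    Step("drop_negatives", lambda rows: [r for r in rows if r >= 0]),
    Step("show_table", show_table, is_visualization=True),
    Step("double", lambda rows: [r * 2 for r in rows]),
]
```

Because the visualization step in this sketch passes its input through unchanged, the exported pipeline produces the same data as the full pipeline while skipping the display work.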

In some embodiments, the pipeline of operations may be displayed on the pipeline creation interface. For example, the functions selected by users may be displayed within a portion of the pipeline creation interface, with the functions listed in a given order based on the users' selections. Users may use the displayed pipeline to make changes to the pipeline and/or the displayed functions. Users may use the displayed pipeline to add a new function (to the beginning, to the end, or within the pipeline), remove an existing function from the pipeline, or rearrange the order of the functions within the pipeline. Users may use the displayed pipeline to view information regarding the functions within the pipeline (e.g., properties of the function, arguments/variables of the functions, code of the functions, data transformations by the functions) and/or to modify the code of the functions within the pipelines.

FIGS. 2-3 illustrate example user interfaces 200, 300 for managing pipelines of operations on data, in accordance with various embodiments. In various embodiments, the user interfaces 200, 300 may be accessed through a software application running on a computing device (e.g., computers, mobile phones, tablets, etc.) that includes one or more processors and memory. For example, the user interfaces 200, 300 may be accessible through a web browser. In another example, the user interfaces 200, 300 may be provided through a data analysis application. In yet another example, the user interfaces 200, 300 may be provided as a service over a network (e.g., software as a service). Depending on the computing device, the user may be able to interact with the user interfaces 200, 300 using various input devices (e.g., keyboard, mouse, etc.) and/or touch/gestures. The user interfaces 200, 300 are provided merely as examples and, naturally, the arrangement and configuration of such user interfaces can vary depending on the implementation. Thus, depending on the implementation, the user interfaces 200, 300 may include additional features and/or alternative features. The user interfaces 200, 300 may include/enable one or more functionalities of the interface(s) described above with respect to the computing system 102/components of the computing system 102.

Referring to FIG. 2, the user interface 200 may include viewing options 202, 204, 206. The viewing options 202, 204, 206 may enable users to select one or more types of view of data. For example, the viewing option 202 may enable users to view data in table form. The viewing option 204 may enable users to view data in map form. The viewing option 206 may enable users to add one or more types of view of data (e.g., table view, map view, histogram view, scatter plot view, wind rose chart view) to the user interface 200. One or more of the viewing options, such as the viewing option 204 to view data in map form, may be provided through one or more external functions and/or one or more user-defined functions. Such viewing options may be similar to an operation/function selected as part of a pipeline of operations. Such viewing options may utilize metadata that allows them to be used for visualization functions. A data information section 208 may provide information relating to accessed data. For example, the data information section 208 may provide information relating to title/name of data, the file path of the data, related data, group(s) to which the data belongs, properties/characteristics of data (e.g., number of rows and columns), and/or other information relating to the data.

A pipeline information section 210 may provide information relating to functions selected by users for inclusion in one or more pipelines. For example, function A 212, function B 214, and function C 216 listed in the pipeline information section 210 may be the functions selected by users for inclusion in a pipeline. An add function section 218 of the pipeline information section 210 may enable users to select functions for inclusion in the pipeline. For example, users may use a search field of the add function section 218 to find the desired function. The add function section 218 may list functions available for selection. The listed functions may include those functions that match the term(s) entered into the search field and/or functions included within one or more libraries. In some embodiments, the listing of functions may be arranged in a particular order/ranked to provide suggestions to users on which functions to select.

Users may be required to use the search field of the add function section 218 before users are allowed to create new functions. For example, a user may enter a term “MISC” into the search field. Based on the search term not returning any hits/based on the listed functions not including the user's desired function, the user may interact with the add function section 218 to create a new function. In some embodiments, the user may create a new function by pressing a certain button (e.g., ENTER) while in the search field. In some embodiments, the user may be presented with an interactive option (e.g., button) to engage to create a new function. Requiring the user to search for functions before allowing the user to create a new function may reduce the likelihood of users recoding existing functions. In some embodiments, the name of the new function may default to the term (e.g., “MISC”) entered into the search field. Users may code the new function using a new user interface or a part of the user interface 200. For example, users may code the new function via a function code section 228.
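The search-before-create behavior may be illustrated with the following sketch. The class name `AddFunctionSection`, its methods, and the case-insensitive substring matching are hypothetical choices made for the example; the disclosure does not specify how matching is performed.

```python
def search_functions(term, library):
    """Case-insensitive substring match on function names (an assumed rule)."""
    needle = term.strip().lower()
    return sorted(name for name in library if needle in name.lower())


class AddFunctionSection:
    """Hypothetical model of an add function section with a search field."""

    def __init__(self, library):
        self.library = library      # dict: function name -> source code
        self.last_search = None     # most recent search term, if any

    def search(self, term):
        self.last_search = term
        return search_functions(term, self.library)

    def create_function(self, code, name=None):
        # Creation is refused until the user has searched at least once.
        if self.last_search is None:
            raise RuntimeError("search for an existing function before creating one")
        # The name defaults to the term entered into the search field.
        name = name or self.last_search
        self.library[name] = code
        return name
```

In this sketch, calling `create_function` before any search raises an error, and a function created after searching for “MISC” is named `MISC` unless another name is supplied.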

The pipeline information section 210 may enable users to select one or more functions to see the results of operation(s) of the selected function on data (via a view section 222). The pipeline information section 210 may enable users to select one or more functions to see the code of the selected function (via the function code section 228).

The pipeline information section 210 may enable users to change a pipeline based on interaction with the pipeline information section 210. Users may change the order of functions within the pipeline by interacting with the functions listed within the pipeline. For example, users may change the order of functions in the pipeline to the function A 212, the function C 216, and the function B 214 by dragging the function C 216 from the bottom of the list to a position between the function A 212 and the function B 214. Users may similarly move the add function section 218 to a given location within the pipeline to add a new function to the pipeline at the given location. Users may remove one or more of the functions from the pipeline by interacting with the listed functions (e.g., dragging a given function out of the pipeline information section 210, interacting with a “remove” button associated with the given function). Users may create a function from multiple functions listed within the pipeline information section 210. For example, users may select two or more of the functions 212, 214, 216 and select an option to create a new function from the selected functions.
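The reordering and removal described above may be modeled, as a sketch only, with list operations on the ordered function names. The helper names are hypothetical, and the sketch assumes each function appears at most once in the pipeline.

```python
def move_function(pipeline, name, new_index):
    """Return a new list with `name` moved to position `new_index`."""
    reordered = [f for f in pipeline if f != name]
    if len(reordered) == len(pipeline):
        raise ValueError(f"{name} is not in the pipeline")
    reordered.insert(new_index, name)
    return reordered


def remove_function(pipeline, name):
    """Return a new list without `name`."""
    return [f for f in pipeline if f != name]


def add_function(pipeline, name, index):
    """Return a new list with `name` inserted at `index`."""
    return pipeline[:index] + [name] + pipeline[index:]
```

For example, `move_function(["A", "B", "C"], "C", 1)` corresponds to dragging function C to the position between function A and function B. Returning new lists rather than mutating in place would make it straightforward to keep the prior ordering for an undo option, although the disclosure does not describe one.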

A library information section 220 may provide information relating to available libraries/functions of available libraries. The library information section 220 may provide a listing of functions within libraries. For example, the library information section 220 may enable users to select one or more libraries to see the functions within the selected library or libraries. The library information section 220 may enable users to select which libraries are searched when users use the search field of the add function section 218. The library information section 220 may enable users to import/export libraries for use.

A view section 222 may provide information on accessed data. The view section may provide views of the data based on user selection, such as based on users' interactions with the viewing options 202, 204, 206. In FIG. 2, users may have interacted with the viewing option 202 to view the data in table form. In some embodiments, one or more types of views may be presented as a default view of data. The table view of data may present data in tabular form. For example, the table view of data may provide data of different types in different columns (or different rows). The view section 222 may provide views of results of operation on the data. For example, responsive to users' selection of a given function in the pipeline information section 210, the view section 222 may provide a view of the data transformed based on the operation(s) of the given function.

The view section 222 may include options 224. The options 224 may enable users to select one or more functions, operations, configurations, and/or other settings relating to the corresponding portion of the data. For example, interacting with the option 224 in the column of data Type D may result in presentation of the portion section 226. The portion section 226 may display functions, operations, configurations, and/or other settings that may be used/set for the portion of the data in the column Type D. Users may use the search field of the portion section 226 to find the desired function, operation, configuration, and/or settings. For example, the portion section 226 may list functions available for selection. The listed functions may include those functions that match the term(s) entered into the search field and/or functions that may be applied to the column of data Type D. Functions that may be applied to the column of data Type D may include generic functions (applicable to multiple types of data) or type-specific functions (applicable to specific types of data). For example, based on the column of data Type D including numerical values or string values, different functions may be listed in the portion section 226. As another example, based on the numerical values of the column of data Type D being recorded in a particular measurement standard (e.g., metric vs standard system), different functions may be listed in the portion section 226. In some embodiments, the listing of functions may be ordered/ranked to provide suggestions to users on which functions to select.
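One way to picture the type-dependent listing is sketched below. The catalog contents, the two-way `number`/`string` type inference, and the ranking rule (type-specific functions before generic ones) are assumptions made for illustration.

```python
# Hypothetical catalog: function name -> column types it accepts.
# None marks a generic function that applies to any type of data.
CATALOG = {
    "rename_column": None,
    "drop_nulls": None,
    "round_values": {"number"},
    "convert_miles_to_km": {"number"},
    "to_uppercase": {"string"},
}


def infer_type(values):
    """Classify a column as 'number' or 'string' from its non-null values."""
    non_null = [v for v in values if v is not None]
    if non_null and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    ):
        return "number"
    return "string"


def list_functions(values, search_term="", catalog=CATALOG):
    """List functions usable on the column, filtered by the search term."""
    column_type = infer_type(values)
    term = search_term.lower()
    matches = [
        name
        for name, types in catalog.items()
        if (types is None or column_type in types) and term in name.lower()
    ]
    # Rank type-specific functions first, then generic ones, alphabetically.
    return sorted(matches, key=lambda name: (catalog[name] is None, name))
```

With this catalog, a numeric column would list `convert_miles_to_km` and `round_values` ahead of the generic functions, while a string column searched with the term "upper" would list only `to_uppercase`.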

The function code section 228 may provide views of code for one or more functions. The code displayed in the function code section 228 may include code of function(s) selected by users (e.g., via the pipeline information section 210/the portion section 226). The function code section 228 may enable users to create, modify, and/or delete code of functions.

The function properties section 230 may provide a view of properties (e.g., name, icon used to represent the function, parameters of the function, such as dataframes, variables of the function, variable types, inputs to function, outputs of function) of one or more functions. The properties displayed in the function properties section 230 may include properties of function(s) selected by users (e.g., via the pipeline information section 210/the portion section 226). The function properties section 230 may enable users to create, modify, and/or delete properties of functions. For example, the function properties section 230 may enable users to specify data types of arguments of the function, may enable users to determine whether to restrict the type of data which may use the function, and/or may enable users to specify keywords and/or other characteristics of the function. Presentation and changes in other properties of functions/code of functions are contemplated.

The function properties section 230 may include options 232, 234 relating to saving/deleting code for functions. For example, based on a user accessing an existing function, the option 232 may indicate that the code of the function has been saved in memory/library. Based on the user modifying the code, the option 232 may indicate an option for the user to save the modified code. The option 232 may present users with options to overwrite the existing code of the function or to create a new function from the modified code. The delete option 234 may enable users to delete code for a function (e.g., delete existing code, delete modified code).

In some embodiments, the user interface 200 may provide one or more warnings relating to code/properties of functions. For example, a given function changed by a user (via the function code section 228/the function properties section 230) may be used/referenced by other functions (related functions). A dependency graph may be generated and used to determine the impact of the change to the given function on the related functions. Information on the impact of the change to the given function on the related functions may be provided via the user interface 200. For example, the function properties section 230 may display that the changed code affects related functions (e.g., by identifying the number and/or identities of the related functions) and/or analysis in which the given function/related functions are used (e.g., by identifying the number and/or identities of the analysis). As another example, the option 232 may change (e.g., change in shape and/or color) to indicate that saving changes to the code will affect related functions/analysis.

FIG. 3 illustrates the user interface 300. In some embodiments, the user interface 300 may be configured to implement some, or all, of the functionalities of the user interface 200 as described above. The user interface 300 may include viewing options 302, 304, 306. The viewing options 302, 304, 306 may enable users to select one or more types of view of data. For example, the viewing option 302 may enable users to view data in table form. The viewing option 304 may enable users to view data in map form. The viewing option 306 may enable users to add one or more types of view of data (e.g., table view, map view, histogram view, scatter plot view, wind rose chart view) to the user interface 300. The data information section 308 may provide information relating to accessed data. For example, the data information section 308 may provide information relating to title/name of data, the file path of the data, related data, group(s) to which the data belongs, properties/characteristics of data (e.g., number of rows and columns), and/or other information relating to the data.

A pipeline information section 310 may provide information relating to functions selected by users for inclusion in one or more pipelines. For example, function A 312, function B 314, and function C 316 listed in the pipeline information section 310 may be the functions selected by users for inclusion in a pipeline. An add function section 318 of the pipeline information section 310 may enable users to select functions for inclusion in the pipeline. For example, users may use a search field of the add function section 318 to find the desired function. The add function section 318 may list functions available for selection. The listed functions may include those functions that match the term(s) entered into the search field and/or functions included within one or more libraries. In some embodiments, the listing of functions may be arranged in a particular order/ranked to provide suggestions to users on which functions to select.

Users may be required to use the search field of the add function section 318 before users are allowed to create new functions. For example, a user may enter a term “MISC” into the search field. Based on the search term not returning any hits/based on the listed functions not including the user's desired function, the user may interact with the add function section 318 to create a new function. In some embodiments, the user may create a new function by pressing a certain button (e.g., ENTER) while in the search field. In some embodiments, the user may be presented with an interactive option (e.g., button) to engage to create a new function. Requiring the user to search for functions before allowing the user to create a new function may reduce the likelihood of users recoding existing functions. In some embodiments, the name of the new function may default to the term (e.g., “MISC”) entered into the search field. Users may code the new function using a new user interface or a part of the user interface 300. For example, users may code the new function via a function code section 328.

The pipeline information section 310 may enable users to select one or more functions to see the results of operation(s) of the selected function on data (via a view section 322). The pipeline information section 310 may enable users to select one or more functions to see the code of the selected function (via the function code section 328).

The pipeline information section 310 may enable users to change a pipeline based on interaction with the pipeline information section 310. Users may change the order of functions within the pipeline by interacting with the functions listed within the pipeline. For example, users may change the order of functions in the pipeline to the function A 312, the function C 316, and the function B 314 by dragging the function C 316 from the bottom of the list to a position between the function A 312 and the function B 314. Users may similarly move the add function section 318 to a given location within the pipeline to add a new function to the pipeline at the given location. Users may remove one or more of the functions from the pipeline by interacting with the listed functions (e.g., dragging a given function out of the pipeline information section 310, interacting with a “remove” button associated with the given function). Users may create a function from multiple functions listed within the pipeline information section 310. For example, users may select two or more of the functions 312, 314, 316 and select an option to create a new function from the selected functions.

A library information section 320 may provide information relating to available libraries/functions of available libraries. The library information section 320 may provide a listing of functions within libraries. For example, the library information section 320 may enable users to select one or more libraries to see the functions within the selected library or libraries. The library information section 320 may enable users to select which libraries are searched when users use the search field of the add function section 318. The library information section 320 may enable users to import/export libraries for use.

A view section 322 may provide information on accessed data. The view section may provide views of the data based on user selection, such as based on users' interactions with the viewing options 302, 304, 306. In some embodiments, one or more types of views may be presented as a default view of data. In FIG. 3, users may have interacted with the viewing option 304 to view the data in map form. For example, the map view of the data in the view section 322 may provide views of routes 338, 340 taken by two vehicles. The view section 322 may include options 342 to change the extent of the map presented within the view section 322. Users may interact with the options 342 and/or the map shown within the view section 322 to change how the map is presented. For example, users may drag the map to see different portions of the map, or select one or more of the routes 338, 340 to see information relating to the selected route(s).

In some embodiments, one or more types of views of data may be provided through one or more external functions. For example, responsive to users' selection of the viewing option 304, an external mapping function may be used to generate the map view shown in the view section 322. A given external function may be used by importing the given external function, importing a library/portion of the library including the given external function, and/or providing inputs to the external function (e.g., the external library, external process) and receiving outputs of the external function.

The function code section 328 may provide views of code for one or more functions. The code displayed in the function code section 328 may include code of function(s) selected by users (e.g., via the pipeline information section 310/the portion section 326). The function code section 328 may enable users to create, modify, and/or delete code of functions.

The function properties section 330 may provide a view of properties (e.g., name, icon used to represent the function, parameters of the function, such as dataframes, variables of the function, variable types, inputs to function, outputs of function) of one or more functions. The properties displayed in the function properties section 330 may include properties of function(s) selected by users (e.g., via the pipeline information section 310/the portion section 326). The function properties section 330 may enable users to create, modify, and/or delete properties of functions. For example, the function properties section 330 may enable users to specify data types of arguments of the function, may enable users to determine whether to restrict the type of data which may use the function, and/or may enable users to specify keywords and/or other characteristics of the function. Presentation and changes in other properties of functions/code of functions are contemplated.

The function properties section 330 may include options 332, 334 relating to saving/deleting code for functions. For example, based on a user accessing an existing function, the option 332 may indicate that the code of the function has been saved in memory/library. Based on the user modifying the code, the option 332 may indicate an option for the user to save the modified code. The option 332 may present users with options to overwrite the existing code of the function or to create a new function from the modified code. The delete option 334 may enable users to delete code for a function (e.g., delete existing code, delete modified code).

In some embodiments, the user interface 300 may provide one or more warnings relating to code/properties of functions. For example, a given function changed by a user (via the function code section 328/the function properties section 330) may be used/referenced by other functions (related functions). A dependency graph may be generated and used to determine the impact of the change to the given function on the related functions. Information on the impact of the change to the given function on the related functions may be provided via the user interface 300. For example, the function properties section 330 may display that the changed code affects related functions (e.g., by identifying the number and/or identities of the related functions) and/or analysis in which the given function/related functions are used (e.g., by identifying the number and/or identities of the analysis). As another example, the option 332 may change (e.g., change in shape and/or color) to indicate that saving changes to the code will affect related functions/analysis.

FIG. 4A illustrates an example interface 400 that provides assistance to users in using a function. For example, the interface 400 may provide assistance to users in using Function A. Function A may take two arguments (A-1, A-2) to perform its operation(s). Based on users' selection of Function A, the interface 400 may be presented to receive users' selection of data/portion of data for the arguments of Function A. For example, the Argument A-1 section 402 may receive users' selection of a portion of data (e.g., column, row) to be used for the argument A-1. The Argument A-1 section 402 may display suggestions for users' selection of the portion of the data in a box 404. For example, the box 404 may display the text of Variable A-1 that represents the argument A-1; based on the Variable A-1 being coded as “speed,” the box 404 may display the text “speed.” Displaying such information may enable users to select the appropriate portion of the data (e.g., column, row) to be used as the argument. As another example, the box 404 may display a specific value (e.g., number, text) suggested for the argument. In some embodiments, the box 404 may include a search field enabling users to search for particular portions of the data (e.g., based on column/row names/properties). The box 406 may provide a listing of portions of data (e.g., column, row) which may be selected to be used for the argument. The portions of the data may be listed based on the argument. For example, the argument A-1 being associated with number type data may result in number type columns/rows being listed in the box 406. The portions of the data may be listed based on user input. For example, the portions of the data listed may be those portions matching user inputs into the search field. In some embodiments, the listing of portions of the data in the box 406 may be ordered/ranked to suggest particular portion(s) for selection by users.
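The listing and ranking of candidate columns for an argument may be sketched as follows. The column names, the type labels, and the ranking rule (columns whose names contain the variable name come first) are hypothetical; the disclosure does not specify a ranking rule.

```python
def suggest_columns(columns, argument_type, variable_name, search=""):
    """List columns usable for an argument, best suggestion first.

    columns: dict mapping column name -> type label (e.g., "number").
    argument_type: type label the argument expects.
    variable_name: name the argument's variable was coded with.
    search: optional text entered into a search field.
    """
    candidates = [
        name
        for name, column_type in columns.items()
        if column_type == argument_type and search.lower() in name.lower()
    ]
    # Columns whose name contains the variable name rank ahead of the rest.
    return sorted(
        candidates,
        key=lambda name: (variable_name.lower() not in name.lower(), name),
    )


columns = {
    "vehicle_id": "string",
    "speed_kmh": "number",
    "heading": "number",
    "altitude": "number",
}
```

In this sketch, an argument coded as "speed" that expects number type data would list `speed_kmh` first, followed by the remaining number type columns.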

Responsive to the user's selection of a given portion of data using the Argument A-1 section 402, the Argument A-2 section 412 may be displayed to receive users' selection of a portion of data to be used for the argument A-2. The Argument A-2 section 412 may operate for argument A-2 as the Argument A-1 section 402 operates as described above for argument A-1. The box 414 may operate for argument A-2 as the box 404 operates as described above for argument A-1. The box 416 may operate for argument A-2 as the box 406 operates as described above for argument A-1. In some embodiments, one or more of the boxes 414, 416 may operate differently based on users' selection of data for argument A-1 using the Argument A-1 section 402. For example, based on users' selection of different columns/rows (e.g., columns/rows of different types of data) for the Argument A-1, the boxes 414, 416 may operate differently to provide different suggestions.

FIG. 4B illustrates an example interface 430 that provides assistance to users in generating a new combined function from multiple functions. A pipeline information section 420 may display functions (function A 422, function B 424, function C 426) selected by users for inclusion in a pipeline of operations. Users may select two or more of the functions 422, 424, 426 and select an option to create a new combined function from the selected functions. Based on users' selection to create a new combined function from existing functions, the interface 430 may be displayed. The interface 430 may include a name field 432 enabling users to provide a name/title for the new combined function. The function properties section 434 may display information relating to properties of the selected functions. For example, based on a user's selection of the function B 424 and the function C 426 for inclusion in a new combined function, the function properties section 434 may display properties of the selected functions, such as variables of the functions (variable B, variable C-1, variable C-2) used to define arguments for the functions. The function properties section 434 may provide options 436 enabling the user to modify the variables (e.g., determine whether a given argument will be set to a specific value in the new combined function; determine whether the given argument will be selectable by users when using the new combined function; determine whether to change the variable/argument in the new combined function).
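As a sketch under stated assumptions, combining function B and function C into a new function, with one argument set to a specific value and the others left selectable, could look like the following. The `combine` helper and the bodies of `function_b` and `function_c` are hypothetical stand-ins.

```python
def combine(functions, fixed=None):
    """Build one function that applies `functions` in order.

    functions: list of (callable, [argument names]) pairs.
    fixed: argument name -> value set once in the combined function;
           every other argument stays selectable by the caller.
    """
    fixed = fixed or {}

    def combined(data, **selected):
        for func, arg_names in functions:
            args = {
                name: fixed[name] if name in fixed else selected[name]
                for name in arg_names
            }
            data = func(data, **args)
        return data

    return combined


# Hypothetical stand-ins for function B and function C.
def function_b(rows, variable_b):
    return [r for r in rows if r >= variable_b]


def function_c(rows, variable_c1, variable_c2):
    return [r * variable_c1 + variable_c2 for r in rows]


new_function = combine(
    [
        (function_b, ["variable_b"]),
        (function_c, ["variable_c1", "variable_c2"]),
    ],
    fixed={"variable_c2": 0},  # set to a specific value in the combination
)
```

Here `new_function([1, 5, 10], variable_b=5, variable_c1=2)` filters with function B and then scales with function C, while `variable_c2` is no longer exposed to the caller.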

FIG. 5 illustrates a flowchart of an example method 500, according to various embodiments of the present disclosure. The method 500 may be implemented in various environments including, for example, the environment 100 of FIG. 1. The operations of method 500 presented below are intended to be illustrative. Depending on the implementation, the example method 500 may include additional, fewer, or alternative steps performed in various orders or in parallel. The example method 500 may be implemented in various computing systems or devices including one or more processors.

At block 502, data may be accessed. At block 504, a set of functions for the data may be provided. At block 506, a user's selection of one or more functions from the set of functions may be received. At block 508, a pipeline of operations for the data may be generated based on the user's selection. The pipeline of operations may include the function(s) selected by the user. At block 510, a user's grouping of multiple functions from the set of functions may be received. At block 512, a new function may be generated based on the multiple functions. At block 514, a change to a given function from the set of functions may be received. At block 516, a dependency graph may be generated for the change to the given function. At block 518, information regarding an impact of the change to the given function on one or more related functions may be provided. At block 520, one or more related functions may be changed based on the change to the given function.

Hardware Implementation

The techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include circuitry or digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, server computer systems, portable computer systems, handheld devices, networking devices or any other device or combination of devices that incorporate hard-wired and/or program logic to implement the techniques.

Computing device(s) are generally controlled and coordinated by operating system software, such as iOS, Android, Chrome OS, Windows XP, Windows Vista, Windows 7, Windows 8, Windows Server, Windows CE, Unix, Linux, SunOS, Solaris, Blackberry OS, VxWorks, or other compatible operating systems. In other embodiments, the computing device may be controlled by a proprietary operating system. Conventional operating systems control and schedule computer processes for execution, perform memory management, provide file system, networking, and I/O services, and provide user interface functionality, such as a graphical user interface (“GUI”), among other things.

FIG. 6 is a block diagram that illustrates a computer system 600 upon which any of the embodiments described herein may be implemented. The computer system 600 includes a bus 602 or other communication mechanism for communicating information, and one or more hardware processors 604 coupled with bus 602 for processing information. Hardware processor(s) 604 may be, for example, one or more general purpose microprocessors.

The computer system 600 also includes a main memory 606, such as a random access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 602 for storing information and instructions to be executed by processor 604. Main memory 606 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 604. Such instructions, when stored in storage media accessible to processor 604, render computer system 600 into a special-purpose machine that is customized to perform the operations specified in the instructions.

The computer system 600 further includes a read only memory (ROM) 608 or other static storage device coupled to bus 602 for storing static information and instructions for processor 604. A storage device 610, such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 602 for storing information and instructions.

The computer system 600 may be coupled via bus 602 to a display 612, such as a cathode ray tube (CRT) or LCD display (or touch screen), for displaying information to a computer user. An input device 614, including alphanumeric and other keys, is coupled to bus 602 for communicating information and command selections to processor 604. Another type of user input device is cursor control 616, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 604 and for controlling cursor movement on display 612. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane. In some embodiments, the same direction information and command selections as cursor control may be implemented via receiving touches on a touch screen without a cursor.

The computing system 600 may include a user interface module to implement a GUI that may be stored in a mass storage device as executable software code that is executed by the computing device(s). This and other modules may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

In general, the word “module,” as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Java, C or C++. A software module may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software modules may be callable from other modules or from themselves, and/or may be invoked in response to detected events or interrupts. Software modules configured for execution on computing devices may be provided on a computer readable medium, such as a compact disc, digital video disc, flash drive, magnetic disc, or any other tangible medium, or as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression or decryption prior to execution). Such software code may be stored, partially or fully, on a memory device of the executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware modules may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors. The modules or computing device functionality described herein are preferably implemented as software modules, but may be represented in hardware or firmware. Generally, the modules described herein refer to logical modules that may be combined with other modules or divided into sub-modules despite their physical organization or storage.

The computer system 600 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 600 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 600 in response to processor(s) 604 executing one or more sequences of one or more instructions contained in main memory 606. Such instructions may be read into main memory 606 from another storage medium, such as storage device 610. Execution of the sequences of instructions contained in main memory 606 causes processor(s) 604 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.

The term “non-transitory media,” and similar terms, as used herein refers to any media that store data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 610. Volatile media includes dynamic memory, such as main memory 606. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.

Non-transitory media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between non-transitory media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 602. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 604 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 600 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 602. Bus 602 carries the data to main memory 606, from which processor 604 retrieves and executes the instructions. The instructions received by main memory 606 may optionally be stored on storage device 610 either before or after execution by processor 604.

The computer system 600 also includes a communication interface 618 coupled to bus 602. Communication interface 618 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks. For example, communication interface 618 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 618 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or WAN component to communicate with a WAN). Wireless links may also be implemented. In any such implementation, communication interface 618 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

A network link typically provides data communication through one or more networks to other data devices. For example, a network link may provide a connection through local network to a host computer or to data equipment operated by an Internet Service Provider (ISP). The ISP in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet”. Local network and Internet both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link and through communication interface 618, which carry the digital data to and from computer system 600, are example forms of transmission media.

The computer system 600 can send messages and receive data, including program code, through the network(s), network link and communication interface 618. In the Internet example, a server might transmit a requested code for an application program through the Internet, the ISP, the local network and the communication interface 618.

The received code may be executed by processor 604 as it is received, and/or stored in storage device 610, or other non-volatile storage for later execution.

Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code modules executed by one or more computer systems or computer processors comprising computer hardware. The processes and algorithms may be implemented partially or wholly in application-specific circuitry.

The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of this disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed example embodiments. The example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed example embodiments.

Conditional language, such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment.

Any process descriptions, elements, or blocks in the flow diagrams described herein and/or depicted in the attached figures should be understood as potentially representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process. Alternate implementations are included within the scope of the embodiments described herein in which elements or functions may be deleted, executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art.

It should be emphasized that many variations and modifications may be made to the above-described embodiments, the elements of which are to be understood as being among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure. The foregoing description details certain embodiments of the invention. It will be appreciated, however, that no matter how detailed the foregoing appears in text, the invention can be practiced in many ways. As is also stated above, it should be noted that the use of particular terminology when describing certain features or aspects of the invention should not be taken to imply that the terminology is being re-defined herein to be restricted to including any specific characteristics of the features or aspects of the invention with which that terminology is associated. The scope of the invention should therefore be construed in accordance with the appended claims and any equivalents thereof.

Engines, Components, and Logic

Certain embodiments are described herein as including logic or a number of components, engines, or mechanisms. Engines may constitute either software engines (e.g., code embodied on a machine-readable medium) or hardware engines. A “hardware engine” is a tangible unit capable of performing certain operations and may be configured or arranged in a certain physical manner. In various example embodiments, one or more computer systems (e.g., a standalone computer system, a client computer system, or a server computer system) or one or more hardware engines of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware engine that operates to perform certain operations as described herein.

In some embodiments, a hardware engine may be implemented mechanically, electronically, or any suitable combination thereof. For example, a hardware engine may include dedicated circuitry or logic that is permanently configured to perform certain operations. For example, a hardware engine may be a special-purpose processor, such as a Field-Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC). A hardware engine may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. For example, a hardware engine may include software executed by a general-purpose processor or other programmable processor. Once configured by such software, hardware engines become specific machines (or specific components of a machine) uniquely tailored to perform the configured functions and are no longer general-purpose processors. It will be appreciated that the decision to implement a hardware engine mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.

Accordingly, the phrase “hardware engine” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, “hardware-implemented engine” refers to a hardware engine. Considering embodiments in which hardware engines are temporarily configured (e.g., programmed), each of the hardware engines need not be configured or instantiated at any one instance in time. For example, where a hardware engine comprises a general-purpose processor configured by software to become a special-purpose processor, the general-purpose processor may be configured as respectively different special-purpose processors (e.g., comprising different hardware engines) at different times. Software accordingly configures a particular processor or processors, for example, to constitute a particular hardware engine at one instance of time and to constitute a different hardware engine at a different instance of time.

Hardware engines can provide information to, and receive information from, other hardware engines. Accordingly, the described hardware engines may be regarded as being communicatively coupled. Where multiple hardware engines exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) between or among two or more of the hardware engines. In embodiments in which multiple hardware engines are configured or instantiated at different times, communications between such hardware engines may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware engines have access. For example, one hardware engine may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware engine may then, at a later time, access the memory device to retrieve and process the stored output. Hardware engines may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).

The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented engines that operate to perform one or more operations or functions described herein. As used herein, “processor-implemented engine” refers to a hardware engine implemented using one or more processors.

Similarly, the methods described herein may be at least partially processor-implemented, with a particular processor or processors being an example of hardware. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented engines. Moreover, the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), with these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an Application Program Interface (API)).

The performance of certain of the operations may be distributed among the processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processors or processor-implemented engines may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the processors or processor-implemented engines may be distributed across a number of geographic locations.

Language

Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.

Although an overview of the subject matter has been described with reference to specific example embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of embodiments of the present disclosure. Such embodiments of the subject matter may be referred to herein, individually or collectively, by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any single disclosure or concept if more than one is, in fact, disclosed.

The embodiments illustrated herein are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed. Other embodiments may be used and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

It will be appreciated that an “engine,” “system,” “data store,” and/or “database” may comprise software, hardware, firmware, and/or circuitry. In one example, one or more software programs comprising instructions capable of being executable by a processor may perform one or more of the functions of the engines, data stores, databases, or systems described herein. In another example, circuitry may perform the same or similar functions. Alternative embodiments may comprise more, less, or functionally equivalent engines, systems, data stores, or databases, and still be within the scope of present embodiments. For example, the functionality of the various systems, engines, data stores, and/or databases may be combined or divided differently.

“Open source” software is defined herein to be source code that allows distribution as source code as well as compiled form, with a well-publicized and indexed means of obtaining the source, optionally with a license that allows modifications and derived works.

The data stores described herein may be any suitable structure (e.g., an active database, a relational database, a self-referential database, a table, a matrix, an array, a flat file, a document-oriented storage system, a non-relational No-SQL system, and the like), and may be cloud-based or otherwise.

As used herein, the term “or” may be construed in either an inclusive or exclusive sense. Moreover, plural instances may be provided for resources, operations, or structures described herein as a single instance. Additionally, boundaries between various resources, operations, engines, and data stores are somewhat arbitrary, and particular operations are illustrated in a context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within a scope of various embodiments of the present disclosure. In general, structures and functionality presented as separate resources in the example configurations may be implemented as a combined structure or resource. Similarly, structures and functionality presented as a single resource may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within a scope of embodiments of the present disclosure as represented by the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Although the invention has been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred implementations, it is to be understood that such detail is solely for that purpose and that the invention is not limited to the disclosed implementations, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present invention contemplates that, to the extent possible, one or more features of any embodiment can be combined with one or more features of any other embodiment.

1. A system comprising: one or more processors; memory storing instructions that, when executed by the one or more processors, cause the system to perform: providing access to a set of functions, each function configured to perform one or more operations on data, wherein the set of functions comprise a first function and a second function that references or calls the first function; receiving a modification of the first function; providing options to confirm the modification of the first function, reverse the modification of the first function, or to store the modification as a new function; receiving an indication of whether to confirm or reverse the modification of the first function, or to store the modification as a new function; and selectively modify the first function and the second function according to the indication.
2. The system of claim 1, wherein the selectively modifying of the first function and the second function comprises: in response to receiving an indication to store the modification as a new function, refraining from modifying the second function.
3. The system of claim 1, wherein the modification of the first function comprises a change in code corresponding to the first function.
4. The system of claim 1, wherein the modification of the first function comprises a change in a restriction of a data type of an argument for the first function.
5. The system of claim 4, wherein the instructions further cause the system to perform: providing a listing of portions of the data as potential arguments based on the change in the restriction of the data type; and ranking the potential arguments.
6. The system of claim 1, wherein the modification of the first function comprises a change in a dataframe of the first function.
7. The system of claim 1, wherein the instructions that, when executed by the one or more processors, cause the system to perform: communicating an impact of the modification of the first function on the second function, wherein the impact comprises a resulting change in data operations of the second function.
8. The system of claim 1, wherein the providing access to the set of functions comprises suggesting the set of functions based on the data or a historical usage of the set of functions and the suggesting the set of functions comprises ranking the set of functions according to a historical likelihood of use for a data type of the data or a frequency of prior use of the set of functions.
9. The system of claim 8, wherein suggesting the set of functions comprises suggesting one or more parameters for at least one function.
10. The system of claim 1, wherein the data is first data, and wherein execution of the instructions further cause the system to perform: determining that second data is historically used with the first data by the first function at least a threshold number of times; and suggesting, via the pipeline creation interface, that the second data be used as an argument of the first function.
11. A method, comprising: providing access to a set of functions, each function configured to perform one or more operations on data, wherein the set of functions comprise a first function and a second function that references or calls the first function; receiving a modification of the first function; providing options to confirm the modification of the first function, reverse the modification of the first function, or to store the modification as a new function; receiving an indication of whether to confirm or reverse the modification of the first function, or to store the modification as a new function; and selectively modify the first function and the second function according to the indication.
12. The method of claim 11, wherein the selectively modifying of the first function and the second function comprises: in response to receiving an indication to store the modification as a new function, refraining from modifying the second function.
13. The method of claim 11, wherein the modification of the first function comprises a change in code corresponding to the first function.
14. The method of claim 11, wherein the modification of the first function comprises a change in a restriction of a data type of an argument for the first function.
15. The method of claim 14, further comprising: providing a listing of portions of the data as potential arguments based on the change in the restriction of the data type; and ranking the potential arguments.
16. The method of claim 11, wherein the modification of the first function comprises a change in a dataframe of the first function.
17. The method of claim 11, further comprising: communicating an impact of the modification of the first function on the second function, wherein the impact comprises a resulting change in data operations of the second function.
18. The method of claim 11, wherein the providing access to the set of functions comprises suggesting the set of functions based on the data or a historical usage of the set of functions and the suggesting the set of functions comprises ranking the set of functions according to a historical likelihood of use for a data type of the data or a frequency of prior use of the set of functions.
19. The method of claim 18, wherein suggesting the set of functions comprises suggesting one or more parameters for at least one function.
20. A non-transitory computer readable medium comprising instructions that, when executed, cause one or more processors to perform: providing access to a set of functions, each function configured to perform one or more operations on data, wherein the set of functions comprise a first function and a second function that references or calls the first function; receiving a modification of the first function; providing options to confirm the modification of the first function, reverse the modification of the first function, or to store the modification as a new function; receiving an indication of whether to confirm or reverse the modification of the first function, or to store the modification as a new function; and selectively modify the first function and the second function according to the indication.